Building a Forecaster using AutoMLx

by the Oracle AutoMLx Team


AutoMLx Forecasting Demo version 23.2.0.

Copyright © 2023, Oracle and/or its affiliates.

Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/

Overview of this Notebook¶

In this notebook we build a forecaster for three example datasets using the Oracle AutoMLx tool. We explore the various options in the Oracle AutoMLx Forecasting module that let the user control the AutoML training process, and we evaluate the resulting statistical forecasting models using the built-in visualization tools. Note that, unlike other tasks such as classification, regression, or anomaly detection, the AutoMLx package does not yet support explainability for forecasting.


Prerequisites¶

  • Experience level: Novice (Python and Machine Learning)
  • Professional experience: Some industry experience

Business Use¶

Forecasting uses historical time series data as input to make informed estimates of future trends. Learning an accurate statistical forecasting model requires expertise in data science and statistics. The process typically comprises:

  • Preprocessing the dataset (cleaning, imputation, feature engineering, normalization).
  • Picking an appropriate model for the given dataset and prediction task.
  • Tuning the chosen model's hyperparameters for that dataset.

These steps are time-consuming and rely heavily on data-scientist expertise. To make matters harder, the best feature subset, model, and hyperparameter choices vary widely with the dataset and the prediction task, so there is no one-size-fits-all recipe for achieving good model performance. With a simple Python API, AutoML can quickly jump-start the data-science process with an accurately tuned model and appropriate features for a given prediction task.

Table of Contents¶

  • 0. Setup
  • 1. Univariate time series
    • 1.1. Load the M4 Forecasting Competition dataset
    • 1.2. Split data into train and test for the forecasting task
    • 1.3. Set the engine and deprecation warnings
    • 1.4. Create an instance of Oracle AutoMLx
    • 1.5. Train a forecasting model using AutoMLx
    • 1.6. Generate and visualize forecasts
    • 1.7. Analyze the AutoML optimization process
      • 1.7.1 Algorithm Selection
      • 1.7.2 Hyperparameter Tuning
    • 1.8. Load the Airline Dataset
    • 1.9. Specify a different score metric for optimization
    • 1.10. Specify the number of cross-validation (CV) folds
  • 2. Multivariate time series
    • 2.1. Generate the data
    • 2.2. Train a model using Oracle AutoMLx
    • 2.3. Make predictions
    • 2.4. Visualization
  • References

Setup¶

Basic setup for the Notebook.

In [1]:
! pip install automlx[viz]

%matplotlib inline
%load_ext autoreload
%autoreload 2
Installing collected packages: protobuf, onnx
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.20.3
    Uninstalling protobuf-3.20.3:
      Successfully uninstalled protobuf-3.20.3
  Attempting uninstall: onnx
    Found existing installation: onnx 1.13.1
    Uninstalling onnx-1.13.1:
      Successfully uninstalled onnx-1.13.1
Successfully installed onnx-1.12.0 protobuf-3.20.1


Load the required modules.

In [2]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import seaborn as sns
import gzip
import matplotlib.pyplot as plt
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.datasets import load_airline

plt.rcParams['figure.figsize'] = [15, 5]
plt.rcParams['font.size'] = 15
sns.set(color_codes=True)
sns.set(font_scale=1.5)
sns.set_palette("bright")
sns.set_style("whitegrid")

import automl
from automl import init
from automl.interface.utils import plot_forecast

Univariate time series¶

The Oracle AutoMLx solution for forecasting can process both univariate and multivariate time series. We start with an example on a univariate time series, and address multivariate data at the end of this notebook.

Load the M4 Forecasting Competition dataset¶

We fetch a univariate time series from the repository of the M4 forecasting competition.

In [3]:
m4_url = "https://github.com/Mcompetitions/M4-methods/raw/master/Dataset/Train/Weekly-train.csv"
m4_metadata_url = "https://github.com/Mcompetitions/M4-methods/raw/master/Dataset/M4-info.csv"

all_series = pd.read_csv(m4_url, index_col=0)  # consists of thousands of series
metadata_csv = pd.read_csv(m4_metadata_url, index_col=0)  # describes their datetime index

We select a series from the finance sector with a weekly collection frequency. The M4 dataset requires additional preprocessing to reconstruct the time series.

In [4]:
series_id = 'W142'
series_metadata = metadata_csv.loc[series_id]
series_values = all_series.loc[series_id]

# drop NaNs for the time period where data wasn't recorded
series_values.dropna(inplace=True)

# retrieve starting date of recording and series length to generate the datetimeindex
start_date = pd.to_datetime(series_metadata.StartingDate)
future_dates = pd.date_range(start=start_date,
                             periods=len(series_values),
                             freq='W')
y = pd.DataFrame(series_values.to_numpy(),
                 index=future_dates,
                 columns=[(series_metadata.category+"_"+series_id)])

We can now visualize the last 200 weeks of data we have on hand.

In [5]:
y = y.tail(n=200)  # approximately 4 years of data
y.plot(ylabel='Weekly Series '+series_id, grid=True)
Out[5]:
<AxesSubplot:ylabel='Weekly Series W142'>

One must ensure that the data points are in a Pandas DataFrame, sorted in chronological order.

In [6]:
print(y.index)
print("Time Index is", "" if y.index.is_monotonic_increasing else "NOT", "monotonic.")
print("Train datatype", type(y))
DatetimeIndex(['2012-12-09 12:00:00', '2012-12-16 12:00:00',
               '2012-12-23 12:00:00', '2012-12-30 12:00:00',
               '2013-01-06 12:00:00', '2013-01-13 12:00:00',
               '2013-01-20 12:00:00', '2013-01-27 12:00:00',
               '2013-02-03 12:00:00', '2013-02-10 12:00:00',
               ...
               '2016-07-31 12:00:00', '2016-08-07 12:00:00',
               '2016-08-14 12:00:00', '2016-08-21 12:00:00',
               '2016-08-28 12:00:00', '2016-09-04 12:00:00',
               '2016-09-11 12:00:00', '2016-09-18 12:00:00',
               '2016-09-25 12:00:00', '2016-10-02 12:00:00'],
              dtype='datetime64[ns]', length=200, freq='W-SUN')
Time Index is  monotonic.
Train datatype <class 'pandas.core.frame.DataFrame'>
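If the index were ever found to be out of order, sorting it chronologically is straightforward; a minimal sketch on a toy DataFrame (not the M4 series):

```python
import pandas as pd

# Toy frame with a deliberately out-of-order datetime index.
df = pd.DataFrame({"value": [3.0, 1.0, 2.0]},
                  index=pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]))
assert not df.index.is_monotonic_increasing

df = df.sort_index()  # restore chronological order
print(df.index.is_monotonic_increasing)  # True
```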

Split data into train and test for the forecasting task¶

As can be seen above, the data contains 200 weekly recorded values, spanning roughly the past four years. We will try to predict the series for the last half year (26 data points), using the preceding years as training data. Hence, we separate the dataset into training and testing sets using a temporal train-test split, which preserves the continuity of the input time series. Each point in the series represents a week, so we hold out the last 26 points as test data.

In [7]:
y_train, y_test = temporal_train_test_split(y, test_size=26)
print("Training length: ", len(y_train)," Testing length: ", len(y_test))
Training length:  174  Testing length:  26
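For a single series such as this one, the temporal split is equivalent to plain positional slicing, which makes it explicit that no shuffling takes place; a sketch on stand-in data:

```python
import pandas as pd

# Stand-in for y: 200 weekly points (the real notebook uses the M4 series).
s = pd.DataFrame({"v": range(200)},
                 index=pd.date_range("2013-01-06", periods=200, freq="W"))

# Hold out the last 26 points; all test timestamps follow all train timestamps.
train, test = s.iloc[:-26], s.iloc[-26:]
print(len(train), len(test))  # 174 26
```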

We see that the train data ranges from December 2012 to April 2016, while the test data ranges from April to October 2016.

In [8]:
print("y_train", y_train)
print("\ny_test", y_test)
y_train                      Finance_W142
2012-12-09 12:00:00      4793.269
2012-12-16 12:00:00      4818.969
2012-12-23 12:00:00      4863.783
2012-12-30 12:00:00      4926.357
2013-01-06 12:00:00      4916.616
...                           ...
2016-03-06 12:00:00      5024.759
2016-03-13 12:00:00      5021.153
2016-03-20 12:00:00      4996.524
2016-03-27 12:00:00      5004.032
2016-04-03 12:00:00      5053.536

[174 rows x 1 columns]

y_test                      Finance_W142
2016-04-10 12:00:00      5039.544
2016-04-17 12:00:00      5053.420
2016-04-24 12:00:00      5066.860
2016-05-01 12:00:00      5099.887
2016-05-08 12:00:00      5082.291
2016-05-15 12:00:00      5084.340
2016-05-22 12:00:00      5081.651
2016-05-29 12:00:00      5106.797
2016-06-05 12:00:00      5147.906
2016-06-12 12:00:00      5140.993
2016-06-19 12:00:00      5153.210
2016-06-26 12:00:00      5258.369
2016-07-03 12:00:00      5307.455
2016-07-10 12:00:00      5295.631
2016-07-17 12:00:00      5300.088
2016-07-24 12:00:00      5295.456
2016-07-31 12:00:00      5333.739
2016-08-07 12:00:00      5338.803
2016-08-14 12:00:00      5339.390
2016-08-21 12:00:00      5345.043
2016-08-28 12:00:00      5369.168
2016-09-04 12:00:00      5401.186
2016-09-11 12:00:00      5395.298
2016-09-18 12:00:00      5386.552
2016-09-25 12:00:00      5377.376
2016-10-02 12:00:00      5406.685

Setting the engine and deprecation warnings¶

The AutoML pipeline offers the function init, which initializes the parallelization engine.

In [9]:
init(engine='dask')
[2023-03-22 09:30:56,928] [automl.xengine] Using Dask Execution
/scratch_user/ypushak/automl-3/automl/package/automl/interface/default.py:79: AutoMLxDeprecationWarning: engine dask is deprecated and will be removed in version 23.3.0.
  warn(f'engine {engine} is deprecated and will be removed in version 23.3.0.', AutoMLxDeprecationWarning)
/scratch_user/ypushak/automl-3/automl/package/automl/interface/default.py:82: AutoMLxDeprecationWarning: engine option dask_scheduler is deprecated and will be removed in version 23.3.0
  warn(f'engine option {engine_opt} is deprecated and will be removed in version 23.3.0', AutoMLxDeprecationWarning)

As the deprecation warning above shows, dask and its related configuration options are deprecated and will be removed in AutoMLx version 23.3.0, along with a few other API changes. By default, the AutoMLx package is configured to display deprecation warnings for all such changes; however, they can be disabled for any newly-created AutoMLx objects via init.

We will also switch to the 'local' parallelization engine, which uses Python's multiprocessing library for parallelism instead.

In [10]:
init(engine='local', check_deprecation_warnings=False)
[2023-03-22 09:30:58,823] [automl.interface] Execution engine (local) has already been initialized. Reinitializing!
[2023-03-22 09:30:59,234] [automl.xengine] Local ProcessPool execution (n_jobs=40)

Create an instance of Oracle AutoMLx¶

The Oracle AutoMLx solution automatically provides a tuned forecasting pipeline that best models the given training dataset and the prediction task at hand. Here the dataset can be any univariate time series.

AutoML for Forecasting consists of three main modules:

  • Preprocessing
    • Impute any missing values using back-fill or forward-fill so that the input has a well-defined and consistent frequency.
    • Identify seasonalities present in the data by detrending the series and analyzing its autocorrelation function (ACF).
    • Decide the appropriate number of cross-validation (CV) folds and the forecast horizons based on the datetime frequency of the data.
  • Algorithm Selection: Identify the right algorithm for a given dataset, choosing from the following:
    • NaiveForecaster - Naive and Seasonal Naive method
    • ThetaForecaster - Equivalent to Simple Exponential Smoothing (SES) with drift
    • ExpSmoothForecaster - Holt-Winters' damped method
    • STLwESForecaster - Seasonal Trend LOESS (locally weighted smoothing) with Exponential Smoothing substructure
    • STLwARIMAForecaster - Seasonal Trend LOESS (locally weighted smoothing) with ARIMA substructure
    • SARIMAForecaster - Seasonal Autoregressive Integrated Moving Average
    • ETSForecaster - Error, Trend, Seasonality (ETS) Statespace Exponential Smoothing
    • ProphetForecaster (optional) - Facebook Prophet. Only available if installed locally with pip install fbprophet
    • OrbitForecaster (optional) - Uber Orbit model with Exogenous Variables. (Available if a supported version is installed)
    • VARMAXForecaster - Vector AutoRegressive Moving Average with Exogenous Variables (Available for multivariate datasets)
    • DynFactorForecaster - Dynamic Factor Models in state-space form with Exogenous Variables (Available for multivariate datasets)
  • Hyperparameter Tuning
    • Find the right model parameters that maximize score for the given dataset.
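The back-fill/forward-fill imputation mentioned in the Preprocessing step above can be mimicked in plain pandas; a minimal sketch on a toy weekly series (AutoMLx's internal implementation may differ):

```python
import numpy as np
import pandas as pd

# Toy weekly series with missing observations.
s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan],
              index=pd.date_range("2021-01-03", periods=5, freq="W"))

# Forward fill first, then back fill to cover any leading NaNs.
filled = s.ffill().bfill()
print(filled.tolist())  # [1.0, 1.0, 1.0, 4.0, 4.0]
```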

These pieces are combined into a simple AutoML pipeline that automates the entire forecasting process with minimal user input. One can then evaluate and visualize the forecast produced by the selected model and, optionally, by the other tuned models.

Train a forecasting model using Oracle AutoMLx¶

The AutoML API is simple to work with. We first create an instance of the pipeline, then pass the training data to the fit() function, which successively executes the modules described above.

The generated model can then be used for forecasting tasks. By default, we use the negative symmetric mean absolute percentage error (sMAPE) as the scoring metric to evaluate model performance. The parameter n_algos_tuned sets the number of algorithms whose hyperparameters are fully tuned. For the most accurate results, it is recommended to set this value to at least 2, and preferably to 8, so that all models are fully tuned.
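For reference, sMAPE can be computed as below. This is a minimal sketch using the common definition of the metric; the exact formula and edge-case handling inside AutoMLx may differ.

```python
import numpy as np

def neg_smape(y_true, y_pred):
    """Negative symmetric mean absolute percentage error (higher is better).

    Common definition: mean of |y - yhat| / ((|y| + |yhat|) / 2),
    negated so that a perfect forecast scores 0, the maximum value.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return -float(np.mean(np.abs(y_true - y_pred) / denom))
```

Because both the actual and predicted values appear in the denominator, the metric is bounded and symmetric with respect to over- and under-forecasting.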

In [11]:
est1 = automl.Pipeline(task='forecasting', n_algos_tuned=4)
est1.fit(X=None, y=y_train)

print('Selected model: {}'.format(est1.selected_model_))
print('Selected model params: {}'.format(est1.selected_model_params_))
[2023-03-22 09:30:59,718] [automl.pipeline] Random state (7) is used for model builds
[2023-03-22 09:30:59,964] [automl.pipeline] Forecast horizon set to 13 for validation sets.
[2023-03-22 09:30:59,967] [automl.pipeline] Dataset shape: (174, 1)
[2023-03-22 09:31:00,016] [automl.pipeline] Running Auto-Preprocessing
[2023-03-22 09:31:00,039] [automl.preprocessing] Number of simple differencing orders required: d = 1
[2023-03-22 09:31:00,041] [automl.preprocessing] Seasonal Periodicities; from decomposed/adjusted: [52, 53, 1] ACF(54-lags):
 [ 1.     0.297  0.204  0.006  0.208  0.042 -0.158 -0.196 -0.039  0.054
 -0.392 -0.276 -0.368  0.112 -0.111 -0.084 -0.051  0.251  0.151 -0.064
 -0.073 -0.002  0.222 -0.119 -0.058 -0.147  0.226 -0.04  -0.07  -0.165
  0.152  0.119 -0.055 -0.051  0.034  0.295 -0.008 -0.078 -0.196  0.074
 -0.244 -0.25  -0.345 -0.028  0.052 -0.154 -0.112 -0.039  0.198  0.063
  0.184  0.178  0.648  0.316  0.155]
[2023-03-22 09:31:00,046] [automl.pipeline] Forecasting centric preprocessing completed. Updated Dataset shape: (174, 1), cv: [(109, 13), (122, 13), (135, 13), (148, 13), (161, 13)]
[2023-03-22 09:31:00,633] [automl.pipeline] VARMAXForecaster and DynFactorForecaster disabled, univariate series found in y.
[2023-03-22 09:31:00,634] [automl.pipeline] Running Model Selection
[2023-03-22 09:31:00,640] [automl.automl] Seasonal Periodicity of micromodels set to 52
[2023-03-22 09:31:00,641] [automl.automl] Differencing order of micromodels set to 1
[2023-03-22 09:31:11,243] [automl.pipeline] Model Selection completed. Selected model: ['ETSForecaster', 'ExpSmoothForecaster', 'SARIMAXForecaster', 'ProphetForecaster']
[2023-03-22 09:31:11,245] [automl.pipeline] Adaptive Sampling Disabled
[2023-03-22 09:31:11,246] [automl.pipeline] Adaptive Sampling Completed. Updated Dataset Shape: (174, 1), Valid Shape: None, CV: [(109, 13), (122, 13), (135, 13), (148, 13), (161, 13)], Class counts: N/A
[2023-03-22 09:31:11,246] [automl.pipeline] Starting Feature Selection 0. Dataset Shape: (174, 1)
[2023-03-22 09:31:11,294] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:31:11,294] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:31:11,310] [automl.pipeline] Feature Selection 0 completed. Updated Dataset shape: (174, 1)
[2023-03-22 09:31:11,356] [automl.pipeline] Tuning ETSForecaster
[2023-03-22 09:31:15,825] [automl.pipeline] Tuning completed. Best params: {'damped': False, 'error': 'add', 'seasonal': 'add', 'sp': 52, 'trend': 'add'}
[2023-03-22 09:31:15,911] [automl.pipeline] Starting Feature Selection 1. Dataset Shape: (174, 1)
[2023-03-22 09:31:15,955] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:31:15,955] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:31:15,970] [automl.pipeline] Feature Selection 1 completed. Updated Dataset shape: (174, 1)
[2023-03-22 09:31:16,012] [automl.pipeline] Tuning ExpSmoothForecaster
[2023-03-22 09:31:17,177] [automl.pipeline] Tuning completed. Best params: {'damped': True, 'seasonal': 'add', 'sp': 52, 'trend': 'add'}
[2023-03-22 09:31:17,228] [automl.pipeline] Starting Feature Selection 2. Dataset Shape: (174, 1)
[2023-03-22 09:31:17,270] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:31:17,271] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:31:17,286] [automl.pipeline] Feature Selection 2 completed. Updated Dataset shape: (174, 1)
[2023-03-22 09:31:17,328] [automl.pipeline] Tuning SARIMAXForecaster
[2023-03-22 09:31:57,951] [automl.pipeline] Tuning completed. Best params: {'D': 1, 'P': 0, 'Q': 0, 'd': 1, 'p': 2, 'q': 0, 'sp': 52, 'trend': 'n', 'use_X': False}
[2023-03-22 09:31:58,042] [automl.pipeline] Starting Feature Selection 3. Dataset Shape: (174, 1)
[2023-03-22 09:31:58,085] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:31:58,086] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:31:58,101] [automl.pipeline] Feature Selection 3 completed. Updated Dataset shape: (174, 1)
[2023-03-22 09:31:58,143] [automl.pipeline] Tuning ProphetForecaster
[2023-03-22 09:32:03,723] [automl.pipeline] Tuning completed. Best params: {'changepoint_prior_scale': 0.5, 'seasonality_mode': 'additive', 'seasonality_prior_scale': 2.5, 'use_X': False}
[2023-03-22 09:32:09,809] [automl.pipeline] (Re)fitting Pipeline
09:32:11 - cmdstanpy - INFO - Chain [1] start processing
09:32:12 - cmdstanpy - INFO - Chain [1] done processing
[2023-03-22 09:32:12,388] [automl.xengine] Local ProcessPool execution (n_jobs=40)
[2023-03-22 09:32:13,523] [automl.pipeline] AutoML completed. Time taken - 66.063 sec
Selected model: SARIMAXForecaster
Selected model params: {'sp': 52, 'p': 2, 'd': 1, 'q': 0, 'P': 0, 'D': 1, 'Q': 0, 'trend': 'n', 'use_X': False}

The selected model parameters indicate a good fit at sp (seasonal periodicity) of 52, which typically corresponds to yearly seasonality for data collected weekly (there are 52 weeks in a year).

Generating and visualizing forecasts¶

There are two interfaces for generating future forecasts with the trained forecasting pipeline. The preferred function, forecast(), accepts the number of periods to forecast into the future, relative to the end of the training series. It also accepts a significance level used to generate prediction confidence intervals (CIs). For methods that support intervals, confidence intervals at level 1 - alpha are generated; e.g., a significance level of alpha=0.05 yields 95% confidence intervals.
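For intuition, the mapping from alpha to interval width can be sketched with a normal approximation. This is an illustration only (the normal_ci helper below is hypothetical, not part of the AutoMLx API); the actual interval construction depends on the selected forecaster.

```python
from statistics import NormalDist

def normal_ci(point, stderr, alpha=0.05):
    """Two-sided (1 - alpha) interval around a point forecast,
    assuming normally distributed forecast errors."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # ~1.96 for alpha=0.05
    return point - z * stderr, point + z * stderr

lo, hi = normal_ci(5050.71, 13.0, alpha=0.05)  # a 95% interval
```

Note that a smaller alpha (higher confidence level) produces a wider interval.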

In [12]:
summary_frame = est1.forecast(periods=len(y_test), alpha=0.05)
print(summary_frame)
                     Finance_W142  Finance_W142_ci_lower  \
2016-04-10 12:00:00   5050.711267            5025.195828   
2016-04-17 12:00:00   5060.723710            5027.887420   
2016-04-24 12:00:00   5073.265895            5035.283368   
2016-05-01 12:00:00   5113.431872            5070.274142   
2016-05-08 12:00:00   5096.908417            5049.526924   
2016-05-15 12:00:00   5108.503351            5057.050036   
2016-05-22 12:00:00   5090.035728            5035.100362   
2016-05-29 12:00:00   5138.713959            5079.783797   
2016-06-05 12:00:00   5145.946553            5083.656684   
2016-06-12 12:00:00   5140.876189            5075.540845   
2016-06-19 12:00:00   5138.573977            5070.306056   
2016-06-26 12:00:00   5167.098835            5095.600033   
2016-07-03 12:00:00   5191.130730            5116.546419   
2016-07-10 12:00:00   5178.201886            5101.157087   
2016-07-17 12:00:00   5169.101318            5089.640013   
2016-07-24 12:00:00   5173.478593            5091.477242   
2016-07-31 12:00:00   5223.737332            5138.517418   
2016-08-07 12:00:00   5218.141108            5130.661634   
2016-08-14 12:00:00   5229.770021            5139.802714   
2016-08-21 12:00:00   5229.333556            5137.139520   
2016-08-28 12:00:00   5257.574451            5162.690782   
2016-09-04 12:00:00   5269.632467            5172.385363   
2016-09-11 12:00:00   5268.939381            5169.607132   
2016-09-18 12:00:00   5247.280175            5146.311383   
2016-09-25 12:00:00   5249.500365            5146.484903   
2016-10-02 12:00:00   5274.110708            5168.638511   

                     Finance_W142_ci_upper  
2016-04-10 12:00:00            5076.356260  
2016-04-17 12:00:00            5093.774447  
2016-04-24 12:00:00            5111.534933  
2016-05-01 12:00:00            5156.956954  
2016-05-08 12:00:00            5144.734505  
2016-05-15 12:00:00            5160.480178  
2016-05-22 12:00:00            5145.570461  
2016-05-29 12:00:00            5198.327762  
2016-06-05 12:00:00            5208.999654  
2016-06-12 12:00:00            5207.052562  
2016-06-19 12:00:00            5207.761069  
2016-06-26 12:00:00            5239.600866  
2016-07-03 12:00:00            5266.802256  
2016-07-10 12:00:00            5256.410317  
2016-07-17 12:00:00            5249.803193  
2016-07-24 12:00:00            5256.800618  
2016-07-31 12:00:00            5310.370570  
2016-08-07 12:00:00            5307.112127  
2016-08-14 12:00:00            5321.312110  
2016-08-21 12:00:00            5323.182148  
2016-08-28 12:00:00            5354.201950  
2016-09-04 12:00:00            5368.707924  
2016-09-11 12:00:00            5370.180254  
2016-09-18 12:00:00            5350.229928  
2016-09-25 12:00:00            5354.577840  
2016-10-02 12:00:00            5381.735178  

The predict(X) interface supports absolute index-based forecasts, but does not support confidence intervals (CIs). It also downcasts the index to int64.

It should be used only when you want both in-sample predictions (predictions at timestamps that are part of the training set) and out-of-sample predictions (predictions at timestamps beyond the training set). The X argument should be an empty DataFrame whose index contains only the requested timestamps. Here we request 5 in-sample model fit values and 5 out-of-sample forecasts.

In [13]:
future_index = y_train.index[-5:].union(y_test.index[:5])
print(future_index)
DatetimeIndex(['2016-03-06 12:00:00', '2016-03-13 12:00:00',
               '2016-03-20 12:00:00', '2016-03-27 12:00:00',
               '2016-04-03 12:00:00', '2016-04-10 12:00:00',
               '2016-04-17 12:00:00', '2016-04-24 12:00:00',
               '2016-05-01 12:00:00', '2016-05-08 12:00:00'],
              dtype='datetime64[ns]', freq='W-SUN')
In [14]:
est1.predict(X=pd.DataFrame(index=future_index))
Out[14]:
Finance_W142
169 5017.740161
170 4995.171601
171 4989.895874
172 5011.014084
173 5055.154138
174 5050.711267
175 5060.723710
176 5073.265895
177 5113.431872
178 5096.908417

AutoML provides a simple one-line tool to visualize forecasts and confidence intervals.

In [15]:
automl.interface.utils.plot_forecast(fitted_pipeline=est1, summary_frame=summary_frame, 
                                           additional_frames=dict(y_test=y_test)) 

Analyze the AutoML optimization process¶

During the AutoML process, a summary of the optimization process is logged. It consists of:

  • Information about the training data
  • Information about the AutoML Pipeline, such as:
    • selected features that AutoML found to be most predictive in the training data;
    • selected algorithm that was the best choice for this data;
    • hyperparameters for the selected algorithm.

AutoML provides a print_summary() API to output all the different trials performed.

In [16]:
est1.print_summary()
Training Dataset size (174, 1)
Validation Dataset size None
CV [([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], [109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121]), ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], [122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134]), ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], [135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147]), ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], [148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160]), ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], [161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173])]
Optimization Metric neg_sym_mean_abs_percent_error
Selected Features Index(['ds'], dtype='object')
Selected Algorithm SARIMAXForecaster
Time taken 63.8078
Selected Hyperparameters {'sp': 52, 'p': 2, 'd': 1, 'q': 0, 'P': 0, 'D': 1, 'Q': 0, 'trend': 'n', 'use_X': False}
AutoML version 23.1.1
Python version 3.8.7 (default, Aug 25 2022, 13:59:56) \n[GCC 8.5.0 20210514 (Red Hat 8.5.0-10.1.0.1)]
Algorithm #Samples #Features Mean Validation Score Hyperparameters CPU Time Memory Usage (GB)
SARIMAXForecaster_HT 174 1 -0.0233 {'D': 1, 'P': 0, 'Q': 0, 'd': 1, 'p': 2, 'q': 0, 'sp': 52, 'trend': 'n', 'use_X': False} 2.6745 (0.0, None)
SARIMAXForecaster_HT 174 1 -0.0234 {'D': 1, 'P': 0, 'Q': 0, 'd': 1, 'p': 2, 'q': 3, 'sp': 52, 'trend': 'n', 'use_X': False} 13.4186 (0.0, None)
SARIMAXForecaster_HT 174 1 -0.0244 {'D': 1, 'P': 0, 'Q': 1, 'd': 1, 'p': 2, 'q': 3, 'sp': 52, 'trend': 'n', 'use_X': False} 78.3846 (0.0, None)
SARIMAXForecaster_HT 174 1 -0.0246 {'D': 1, 'P': 0, 'Q': 1, 'd': 1, 'p': 2, 'q': 1, 'sp': 52, 'trend': 'n', 'use_X': False} 48.2760 (0.0, None)
SARIMAXForecaster_HT 174 1 -0.0247 {'D': 1, 'P': 0, 'Q': 1, 'd': 1, 'p': 2, 'q': 0, 'sp': 52, 'trend': 'n', 'use_X': False} 21.3202 (0.0, None)
... ... ... ... ... ... ...
ExpSmoothForecaster_HT 174 1 -0.2974 {'damped': False, 'seasonal': 'None', 'sp': 52, 'trend': 'add'} 0.1776 (0.0, None)
NaiveForecaster_AS 174 1 -0.4447 {'sp': 52} 0.0316 0.0
ExpSmoothForecaster_HT 174 1 -inf {'damped': False, 'seasonal': 'add', 'sp': 1, 'trend': 'add'} 0.0084 (0.0, None)
ETSForecaster_HT 174 1 -inf {'damped': True, 'error': 'add', 'seasonal': 'add', 'sp': 1, 'trend': 'None'} 0.0058 (0.0, None)
ETSForecaster_HT 174 1 -inf {'damped': False, 'error': 'add', 'seasonal': 'add', 'sp': 1, 'trend': 'add'} 0.0106 (0.0, None)

We also provide the capability to visualize the results of each stage of the AutoML pipeline.

Algorithm Selection¶

The plot below shows the scores predicted by Algorithm Selection for each algorithm. Since negative sMAPE is used by default, higher values (closer to zero) are better. The horizontal line shows the average score across all algorithms. Algorithms that score at or above the average are colored teal, whereas those below the average are colored turquoise. Here we can see that the SARIMAXForecaster algorithm achieved the best predicted score (orange bar) and is chosen for the subsequent stages of the pipeline.

In [17]:
# Each trial is a tuple of
# (algorithm, no. samples, no. features, mean CV score, hyperparameters, 
# all CV scores, total CV time (s), memory usage (Gb))
trials = est1.model_selection_trials_ 
scores = [x[3] for x in trials]
models = [x[0] for x in trials]
y_margin = 0.10 * (max(scores) - min(scores))
s = pd.Series(scores, index=models).sort_values(ascending=False)

colors = []
for f in s.keys():
    if f == '{}_AS'.format(est1.selected_model_):
        colors.append('orange')
    elif s[f] >= s.mean():
        colors.append('teal')
    else:
        colors.append('turquoise')
        

fig, ax = plt.subplots(1)
ax.set_title("Algorithm Selection Trials")
ax.set_ylim(min(scores) - y_margin, max(scores) + y_margin)
ax.set_ylabel(est1.inferred_score_metric[0])
s.plot.bar(ax=ax, color=colors, edgecolor='black')
ax.axhline(y=s.mean(), color='black', linewidth=0.5)
plt.show()

Hyperparameter Tuning¶

Hyperparameter tuning is the last stage of the Oracle AutoMLx pipeline, and focuses on improving the chosen algorithm's score. We use a novel algorithm to search across many hyperparameter dimensions, and converge automatically when optimal hyperparameters are identified. Each trial in the graph below represents a particular hyperparameter combination for the selected model.
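The search algorithm itself is internal to AutoMLx, but each logged trial can be thought of as a score evaluation of one candidate configuration, of which the best is kept. A naive exhaustive version, for intuition only (AutoMLx instead searches adaptively and converges without trying every combination):

```python
import itertools

def exhaustive_tune(score_fn, param_grid):
    """Evaluate every hyperparameter combination and keep the one
    with the highest score. `score_fn` maps a parameter dict to a
    score (higher is better), e.g. a mean CV score."""
    keys = sorted(param_grid)
    best_params, best_score = None, float('-inf')
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Exhaustive search becomes intractable as the number of hyperparameter dimensions grows, which is why an adaptive search with automatic convergence is used instead.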

In [18]:
# Each trial is a tuple of
# (algorithm, no. samples, no. features, mean CV score, hyperparameters, 
# all CV scores, total CV time (s), memory usage (Gb))
trials = np.array(est1.tuning_trials_)[::-1]    # chronological order
scores = np.array([x[3] for x in trials])
finite_indexes = np.where(np.isfinite(scores))  # drop failed (-inf) trials

trials = trials[finite_indexes]
scores = scores[finite_indexes]
y_margin = 0.10 * (max(scores) - min(scores))


fig, ax = plt.subplots(1)
ax.set_title("Hyperparameter Tuning Trials")
ax.set_xlabel("Iteration $n$")
ax.set_ylabel(est1.inferred_score_metric[0])
ax.grid(color='g', linestyle='-', linewidth=0.1)
ax.set_ylim(min(scores) - y_margin, max(scores) + y_margin)
ax.plot(range(1, len(trials) + 1), scores, ':', marker='s', markersize=3, color='teal')
plt.show()

We can also view all the tuned algorithms, along with their validation and test performance. This provides a good sanity check on the pipeline's decisions.

In [19]:
print(f'### Plotting is enabled for total models tuned = {len(est1.pipelines_)}.')
print('### Model_name\t\t Val_score\t Test_score ')
for pipe_ in est1.pipelines_:
    model_name = pipe_.selected_model_
    print(model_name, '\t',
          '%.4f' % pipe_.k_results[model_name]['best_score'], '\t',          # validation score
          '%.4f' % pipe_.score(pd.DataFrame(index=y_test.index), y=y_test))  # test score
    summary_frame = pipe_.forecast(len(y_test), alpha=0.05)                  # out-of-sample forecast
    fig = automl.interface.utils.plot_forecast(fitted_pipeline=pipe_, summary_frame=summary_frame,
                                               additional_frames=dict(y_test=y_test))
    fig.show()
### Plotting is enabled for total models tuned = 4.
### Model_name		 Val_score	 Test_score 
SARIMAXForecaster  	 -0.0233 	 -0.0141
ExpSmoothForecaster  	 -0.0279 	 -0.0161
ETSForecaster  	 -0.0280 	 -0.0192
ProphetForecaster  	 -0.0445 	 -0.0141

Load the Airline Dataset¶

The Airline Passenger univariate series represents the monthly total of international airline passengers (in thousands) from January 1949 to December 1960. To showcase AutoML's functionality in the absence of a datetime index, we drop the datetime index and use the series with only an integer (int64) index.

In [20]:
yair = pd.DataFrame(load_airline()) # Input must be a pd.DataFrame type
yair.index = np.arange(0, len(yair)) # replace the datetime index with an integer index.
yair.plot(ylabel='Number of Airline Passengers', grid=True)
Out[20]:
<AxesSubplot:ylabel='Number of Airline Passengers'>

We will forecast the last 20% of data, using the previous years as training data.

In [21]:
yair_train, yair_test = temporal_train_test_split(yair, test_size=0.2)
print("Training length: ", len(yair_train)," Testing length: ", len(yair_test))
Training length:  115  Testing length:  29

Specify a different score metric for Oracle AutoMLx optimization¶

The pipeline tries to maximize a given score metric by exploring different methods and hyperparameter choices. By default, the score metric is the negative of sMAPE. The user can also choose another metric. For the forecasting task, the accepted metrics are: 'neg_sym_mean_abs_percent_error', 'neg_root_mean_squared_percent_error', 'neg_mean_abs_scaled_error', 'neg_root_mean_squared_error', 'neg_mean_squared_error', 'neg_max_absolute_error', and 'neg_mean_absolute_error'.

Here, we ask AutoML to optimize for MASE ('neg_mean_abs_scaled_error'), a scale-invariant scoring metric.
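A minimal non-seasonal MASE sketch, for reference (the internal AutoMLx implementation may use a seasonal naive scale and handle edge cases differently):

```python
import numpy as np

def neg_mase(y_train, y_true, y_pred):
    """Negative mean absolute scaled error (higher is better).

    Forecast errors are scaled by the in-sample mean absolute error of a
    one-step naive forecast on the training series, which makes the
    metric invariant to the scale of the data.
    """
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(np.diff(y_train)))  # naive one-step error
    errors = np.abs(np.asarray(y_true, dtype=float)
                    - np.asarray(y_pred, dtype=float))
    return -float(np.mean(errors) / scale)
```

Because both numerator and denominator are in the units of the series, multiplying the data by a constant leaves the score unchanged.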

In [22]:
est2 = automl.Pipeline(task='forecasting', n_algos_tuned=3, score_metric='neg_mean_abs_scaled_error')
est2.fit(y=yair_train)

test_score = automl.models.score.time_series_loss(est2, X=pd.DataFrame(index=yair_test.index), 
                                                  y=yair_test, scoring='neg_mean_abs_scaled_error')
print('Selected model: {}'.format(est2.selected_model_))
print('Selected model params: {}'.format(est2.selected_model_params_))
print(f'Score on test data : {test_score}')
[2023-03-22 09:32:21,135] [automl.pipeline] Random state (7) is used for model builds
[2023-03-22 09:32:21,362] [automl.pipeline] Dataset shape: (115, 1)
[2023-03-22 09:32:21,405] [automl.pipeline] Running Auto-Preprocessing
[2023-03-22 09:32:21,423] [automl.preprocessing] Number of simple differencing orders required: d = 2
[2023-03-22 09:32:21,425] [automl.preprocessing] Seasonal Periodicities; from decomposed/adjusted: [12, 24, 1] ACF(54-lags):
 [ 1.     0.173 -0.147 -0.094 -0.309 -0.076  0.093 -0.13  -0.353 -0.115
 -0.149  0.175  0.812  0.175 -0.158 -0.036 -0.269 -0.031  0.073 -0.13
 -0.319 -0.111 -0.107  0.166  0.681  0.144 -0.135 -0.024 -0.193 -0.031
  0.058 -0.134 -0.268 -0.123 -0.065  0.118  0.597  0.126 -0.129 -0.003
 -0.132 -0.026  0.036 -0.114 -0.259 -0.09  -0.044  0.09   0.505  0.125
 -0.137  0.004 -0.078 -0.004  0.044]
[2023-03-22 09:32:21,428] [automl.pipeline] Forecasting centric preprocessing completed. Updated Dataset shape: (115, 1), cv: [(55, 30), (85, 30)]
[2023-03-22 09:32:21,484] [automl.pipeline] ProphetForecaster and OrbitForecaster disabled, no datetime index provided.
[2023-03-22 09:32:21,484] [automl.pipeline] VARMAXForecaster and DynFactorForecaster disabled, univariate series found in y.
[2023-03-22 09:32:21,485] [automl.pipeline] Running Model Selection
[2023-03-22 09:32:21,486] [automl.automl] Seasonal Periodicity of micromodels set to 12
[2023-03-22 09:32:21,486] [automl.automl] Differencing order of micromodels set to 2
[2023-03-22 09:32:22,268] [automl.pipeline] Model Selection completed. Selected model: ['STLwESForecaster', 'SARIMAXForecaster', 'STLwARIMAForecaster']
[2023-03-22 09:32:22,269] [automl.pipeline] Adaptive Sampling Disabled
[2023-03-22 09:32:22,269] [automl.pipeline] Adaptive Sampling Completed. Updated Dataset Shape: (115, 1), Valid Shape: None, CV: [(55, 30), (85, 30)], Class counts: N/A
[2023-03-22 09:32:22,270] [automl.pipeline] Starting Feature Selection 0. Dataset Shape: (115, 1)
[2023-03-22 09:32:22,312] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:32:22,313] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:32:22,327] [automl.pipeline] Feature Selection 0 completed. Updated Dataset shape: (115, 1)
[2023-03-22 09:32:22,369] [automl.pipeline] Tuning STLwESForecaster
[2023-03-22 09:32:23,032] [automl.pipeline] Tuning completed. Best params: {'es_damped_trend': True, 'es_trend': 'add', 'low_pass_deg': 0, 'seasonal_deg': 0, 'sp': 24, 'trend_deg': 1}
[2023-03-22 09:32:23,081] [automl.pipeline] Starting Feature Selection 1. Dataset Shape: (115, 1)
[2023-03-22 09:32:23,123] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:32:23,124] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:32:23,138] [automl.pipeline] Feature Selection 1 completed. Updated Dataset shape: (115, 1)
[2023-03-22 09:32:23,179] [automl.pipeline] Tuning SARIMAXForecaster
[2023-03-22 09:32:24,721] [automl.pipeline] Tuning completed. Best params: {'D': 0, 'P': 1, 'Q': 1, 'd': 2, 'p': 2, 'q': 2, 'sp': 12, 'trend': 'n', 'use_X': False}
[2023-03-22 09:32:24,770] [automl.pipeline] Starting Feature Selection 2. Dataset Shape: (115, 1)
[2023-03-22 09:32:24,811] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:32:24,812] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:32:24,827] [automl.pipeline] Feature Selection 2 completed. Updated Dataset shape: (115, 1)
[2023-03-22 09:32:24,868] [automl.pipeline] Tuning STLwARIMAForecaster
[2023-03-22 09:32:25,638] [automl.pipeline] Tuning completed. Best params: {'arima_d': 2, 'arima_p': 2, 'arima_q': 2, 'low_pass_deg': 0, 'seasonal_deg': 1, 'sp': 12, 'trend_deg': 0}
[2023-03-22 09:32:28,754] [automl.pipeline] (Re)fitting Pipeline
[2023-03-22 09:32:30,018] [automl.xengine] Local ProcessPool execution (n_jobs=40)
[2023-03-22 09:32:30,642] [automl.pipeline] AutoML completed. Time taken - 5.238 sec
Selected model: STLwARIMAForecaster
Selected model params: {'seasonal_deg': 1, 'trend_deg': 0, 'low_pass_deg': 0, 'sp': 12, 'arima_p': 2, 'arima_d': 2, 'arima_q': 2}
Score on test data : -0.4026405927768431
In [23]:
automl.interface.utils.plot_forecast(fitted_pipeline=est2, summary_frame=est2.forecast(len(yair_test)), 
                                           additional_frames=dict(y_test=yair_test, y_train=yair_train))

Specify the number of cross-validation (CV) folds¶

AutoML automatically decides how many folds to create, based on the frequency and length of the input series. In the example above, the preprocessor chose to create two folds. In the following, we set the number of folds to 8.
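For intuition, expanding-window folds of the kind AutoMLx logs (e.g., cv: [(109, 13), (122, 13), ...] earlier in this notebook) can be constructed as below. The helper is hypothetical; AutoMLx's actual split logic is internal and may differ.

```python
def expanding_window_splits(n, horizon, n_folds):
    """Build (train_indices, test_indices) pairs in which each fold
    trains on an expanding prefix of the series and validates on the
    next `horizon` points, with the final fold ending at the last point."""
    splits = []
    for k in range(n_folds, 0, -1):
        test_end = n - (k - 1) * horizon
        train_end = test_end - horizon
        splits.append((list(range(train_end)),
                       list(range(train_end, test_end))))
    return splits

# n=174, horizon=13, 5 folds gives train lengths 109, 122, 135, 148, 161,
# each validated on the following 13 points.
```

Each validation window has the same length as the forecast horizon, so CV scores estimate performance on exactly the kind of forecast the model will be asked to make.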

In [24]:
est3 = automl.Pipeline(task='forecasting')
est3.fit(y=yair_train, cv=8)

print('Selected model: {}'.format(est3.selected_model_))
print('Selected model params: {}'.format(est3.selected_model_params_))
print(f'Score on test data : {est3.score(pd.DataFrame(index=yair_test.index), y=yair_test)}')

fig = automl.interface.utils.plot_forecast(fitted_pipeline=est3, summary_frame=est3.forecast(len(yair_test)), 
                                           additional_frames=dict(y_test=yair_test))
[2023-03-22 09:32:31,967] [automl.pipeline] Random state (7) is used for model builds
[2023-03-22 09:32:32,214] [automl.pipeline] Dataset shape: (115, 1)
[2023-03-22 09:32:32,269] [automl.pipeline] Running Auto-Preprocessing
[2023-03-22 09:32:32,293] [automl.preprocessing] Number of simple differencing orders required: d = 2
[2023-03-22 09:32:32,296] [automl.preprocessing] Seasonal Periodicities; from decomposed/adjusted: [12, 24, 1] ACF(54-lags):
 [ 1.     0.173 -0.147 -0.094 -0.309 -0.076  0.093 -0.13  -0.353 -0.115
 -0.149  0.175  0.812  0.175 -0.158 -0.036 -0.269 -0.031  0.073 -0.13
 -0.319 -0.111 -0.107  0.166  0.681  0.144 -0.135 -0.024 -0.193 -0.031
  0.058 -0.134 -0.268 -0.123 -0.065  0.118  0.597  0.126 -0.129 -0.003
 -0.132 -0.026  0.036 -0.114 -0.259 -0.09  -0.044  0.09   0.505  0.125
 -0.137  0.004 -0.078 -0.004  0.044]
[2023-03-22 09:32:32,300] [automl.pipeline] Forecasting centric preprocessing completed. Updated Dataset shape: (115, 1), cv: 8
[2023-03-22 09:32:32,369] [automl.pipeline] ProphetForecaster and OrbitForecaster disabled, no datetime index provided.
[2023-03-22 09:32:32,371] [automl.pipeline] VARMAXForecaster and DynFactorForecaster disabled, univariate series found in y.
[2023-03-22 09:32:32,371] [automl.pipeline] Running Model Selection
[2023-03-22 09:32:32,372] [automl.automl] Seasonal Periodicity of micromodels set to 12
[2023-03-22 09:32:32,373] [automl.automl] Differencing order of micromodels set to 2
[2023-03-22 09:32:39,039] [automl.pipeline] Model Selection completed. Selected model: ['STLwARIMAForecaster']
[2023-03-22 09:32:39,040] [automl.pipeline] Adaptive Sampling Disabled
[2023-03-22 09:32:39,041] [automl.pipeline] Adaptive Sampling Completed. Updated Dataset Shape: (115, 1), Valid Shape: None, CV: 8, Class counts: N/A
[2023-03-22 09:32:39,042] [automl.pipeline] Starting Feature Selection 0. Dataset Shape: (115, 1)
[2023-03-22 09:32:39,089] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:32:39,090] [automl.pipeline] Using all features: Index(['ds'], dtype='object')
[2023-03-22 09:32:39,105] [automl.pipeline] Feature Selection 0 completed. Updated Dataset shape: (115, 1)
[2023-03-22 09:32:39,150] [automl.pipeline] Tuning STLwARIMAForecaster
[2023-03-22 09:32:40,944] [automl.pipeline] Tuning completed. Best params: {'arima_d': 2, 'arima_p': 2, 'arima_q': 2, 'low_pass_deg': 0, 'seasonal_deg': 1, 'sp': 12, 'trend_deg': 1}
[2023-03-22 09:32:43,473] [automl.pipeline] (Re)fitting Pipeline
[2023-03-22 09:32:44,007] [automl.xengine] Local ProcessPool execution (n_jobs=40)
[2023-03-22 09:32:44,195] [automl.pipeline] AutoML completed. Time taken - 9.031 sec
Selected model: STLwARIMAForecaster
Selected model params: {'seasonal_deg': 1, 'trend_deg': 1, 'low_pass_deg': 0, 'sp': 12, 'arima_p': 2, 'arima_d': 2, 'arima_q': 2}
Score on test data : -0.026295793541552835

Multivariate time series¶

Generate the data¶

We now demonstrate the use of the Oracle AutoMLx solution for multivariate time series. We load the 10-dimensional Lutkepohl2 dataset and restrict it to 4 variables: two exogenous variables (variables that are independent of all other data variables) and two endogenous variables (variables that depend on some other data variables). The endogenous variables are the prediction targets of the pipeline, while the exogenous variables are used solely as explanatory variables.

In [25]:
dta = sm.datasets.webuse('lutkepohl2', 'https://www.stata-press.com/data/r12/')
dta.index = dta.qtr
dta.index.freq = dta.index.inferred_freq
endog = dta.loc['1960-04-01':'1978-10-01', ['dln_inv', 'dln_inc',]]
exog = dta.loc['1960-04-01':'1978-10-01', ['dln_consump']]
exog = sm.add_constant(exog)

We then split it using a temporal train-test split as done previously. Note that $X$ consists of the exogenous variables ('dln_consump' plus a constant) while $y$ consists of the target variables ('dln_inv' and 'dln_inc').

In [26]:
X_train_df, X_test_df = temporal_train_test_split(exog, train_size=0.9)
y_train_df, y_test_df = temporal_train_test_split(endog, train_size=0.9)

Train a model using Oracle AutoMLx¶

We can now fit the AutoML pipeline. For the multivariate forecasting task, the pipeline considers only two models: VARMAX and DynFactor.

In [27]:
pipeline = automl.Pipeline(task='forecasting',
                           n_algos_tuned=1,
                           score_metric='neg_sym_mean_abs_percent_error')

pipeline.fit(X=X_train_df, y=y_train_df)
[2023-03-22 09:32:46,139] [automl.pipeline] Random state (7) is used for model builds
[2023-03-22 09:32:46,379] [automl.pipeline] Forecast horizon set to 8 for validation sets.
[2023-03-22 09:32:46,380] [automl.pipeline] Dataset shape: (67, 2)
[2023-03-22 09:32:46,433] [automl.pipeline] Running Auto-Preprocessing
[2023-03-22 09:32:46,454] [automl.preprocessing] Number of simple differencing orders required: d = 1
[2023-03-22 09:32:46,457] [automl.preprocessing] Seasonal effects not found. Using datetime periodicity of 4
[2023-03-22 09:32:46,459] [automl.preprocessing] Seasonal Periodicities; from decomposed/adjusted: [4, 1] ACF(54-lags):
 [ 1.    -0.568  0.015 -0.015  0.25  -0.265  0.047  0.149 -0.095 -0.109
  0.092  0.143 -0.203  0.012  0.113 -0.026 -0.177  0.165  0.116 -0.274
  0.119  0.074 -0.051 -0.079  0.087  0.042 -0.175  0.113  0.063 -0.074
 -0.042  0.111 -0.068 -0.026  0.048  0.09  -0.198  0.112 -0.014  0.097
 -0.169  0.097 -0.039  0.055 -0.085  0.068 -0.016 -0.037  0.035 -0.008
  0.036 -0.088  0.12  -0.138  0.1  ]
[2023-03-22 09:32:46,468] [automl.pipeline] Forecasting centric preprocessing completed. Updated Dataset shape: (67, 3), cv: [(43, 8), (51, 8), (59, 8)]
[2023-03-22 09:32:46,562] [automl.pipeline] Multivariate series found in y, only VARMAXForecaster and DynFactorForecaster models apply.
[2023-03-22 09:32:46,563] [automl.pipeline] Running Model Selection
[2023-03-22 09:32:46,564] [automl.automl] Seasonal Periodicity of micromodels set to 4
[2023-03-22 09:32:46,565] [automl.automl] Differencing order of micromodels set to 1
[2023-03-22 09:32:52,605] [automl.pipeline] Model Selection completed. Selected model: ['VARMAXForecaster']
[2023-03-22 09:32:52,606] [automl.pipeline] Adaptive Sampling Disabled
[2023-03-22 09:32:52,608] [automl.pipeline] Adaptive Sampling Completed. Updated Dataset Shape: (67, 3), Valid Shape: None, CV: [(43, 8), (51, 8), (59, 8)], Class counts: N/A
[2023-03-22 09:32:52,608] [automl.pipeline] Starting Feature Selection 0. Dataset Shape: (67, 3)
[2023-03-22 09:32:52,652] [automl.pipeline] Feature Selection Disabled
[2023-03-22 09:32:52,653] [automl.pipeline] Using all features: Index(['ds', 'const', 'dln_consump'], dtype='object')
[2023-03-22 09:32:52,667] [automl.pipeline] Feature Selection 0 completed. Updated Dataset shape: (67, 3)
[2023-03-22 09:32:52,709] [automl.pipeline] Tuning VARMAXForecaster
[2023-03-22 09:32:54,064] [automl.pipeline] Tuning completed. Best params: {'error_cov_type': 'unstructured', 'pq_order': '40', 'trend': 'c', 'use_X': False}
[2023-03-22 09:32:55,247] [automl.pipeline] (Re)fitting Pipeline
[2023-03-22 09:32:56,307] [automl.xengine] Local ProcessPool execution (n_jobs=40)
[2023-03-22 09:32:56,482] [automl.pipeline] AutoML completed. Time taken - 8.458 sec
Out[27]:
Pipeline(model_list=['NaiveForecaster', 'ThetaForecaster',
                     'ExpSmoothForecaster', 'ETSForecaster', 'STLwESForecaster',
                     'STLwARIMAForecaster', 'SARIMAXForecaster',
                     'VARMAXForecaster', 'DynFactorForecaster',
                     'ProphetForecaster'])

The AutoML pipeline provides attributes to get the selected features, the chosen model and its hyperparameters, as well as the score on the test set.

In [28]:
print('Selected features: {}'.format(pipeline.selected_features_))
print('Ranked models: {}'.format(pipeline.ranked_models_))
print('Selected model: {}'.format(pipeline.selected_model_))
print('Selected model params: {}'.format(pipeline.selected_model_params_))
Selected features: [0, 1, 2]
Ranked models: ['VARMAXForecaster']
Selected model: VARMAXForecaster
Selected model params: {'pq_order': '40', 'trend': 'c', 'error_cov_type': 'unstructured', 'use_X': False}

Make predictions¶

As mentioned in the univariate case, there are two ways of making a prediction:

  • forecast(k) allows one to predict k steps after the end of the training data. It should be used when one wants to make out-of-sample predictions
  • predict(X) returns predictions at the timestamps given as argument. It should be used when one wants to make in-sample predictions and out-of-sample predictions. It does not support confidence intervals.

In the cell below, predict() is used on the last 5 timestamps of the train set and all timestamps of the test set. forecast() is used to predict k steps after the training set, where k is the size of the test set.

In [29]:
y_pred = pipeline.predict(pd.concat([X_train_df[-5:], X_test_df], axis=0))
y_forecast = pipeline.forecast(len(y_test_df), alpha=0.8, X=X_test_df)    # out-of-sample forecast

The obtained forecast contains predictions for the two target variables, as well as lower and upper confidence intervals, for each timestamp in the test set.

In [30]:
y_forecast
Out[30]:
dln_inv dln_inc dln_inv_ci_lower dln_inv_ci_upper
1977-01-01 0.013637 0.021205 0.002588 0.024686
1977-04-01 0.020282 0.022736 0.008766 0.031797
1977-07-01 0.007802 0.020105 -0.003724 0.019328
1977-10-01 0.031891 0.021881 0.020207 0.043574
1978-01-01 0.014414 0.021965 0.002362 0.026466
1978-04-01 0.017645 0.021431 0.005427 0.029863
1978-07-01 0.017818 0.021707 0.005579 0.030057
1978-10-01 0.022726 0.021450 0.010445 0.035006
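The interval columns are ordinary DataFrame columns and can be consumed directly. For instance, a quick sanity check that each point forecast lies inside its band; the first three dln_inv rows above are re-entered by hand so the snippet is self-contained:

```python
import pandas as pd

# First three dln_inv rows of the forecast above, re-entered by hand.
# Column names follow the `<target>_ci_lower` / `<target>_ci_upper` pattern.
band = pd.DataFrame(
    {
        "dln_inv": [0.013637, 0.020282, 0.007802],
        "dln_inv_ci_lower": [0.002588, 0.008766, -0.003724],
        "dln_inv_ci_upper": [0.024686, 0.031797, 0.019328],
    },
    index=pd.to_datetime(["1977-01-01", "1977-04-01", "1977-07-01"]),
)

# Each point forecast should sit inside its confidence band.
inside = band["dln_inv"].between(band["dln_inv_ci_lower"],
                                 band["dln_inv_ci_upper"])
print(inside.all())  # True
```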

One can also directly compute the score of the tuned model on the test set, without needing to run forecast() or predict().

In [31]:
print("Tuned model testing score (negative sMAPE): ", pipeline.score(X=X_test_df, y=y_test_df))
Tuned model testing score (negative sMAPE):  -0.5925192193208232
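The score metric configured earlier, neg_sym_mean_abs_percent_error, is a negated symmetric mean absolute percentage error, so higher scores (closer to zero) are better. A minimal sketch of one common sMAPE definition follows; AutoMLx's exact normalization may differ:

```python
import numpy as np

def neg_smape(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Negative symmetric MAPE: mean of 2*|y - yhat| / (|y| + |yhat|), negated.

    Negating makes the metric "higher is better", matching AutoMLx's
    convention for its neg_* score names.
    """
    denom = np.abs(y_true) + np.abs(y_pred)
    return -float(np.mean(2.0 * np.abs(y_true - y_pred) / denom))

y_true = np.array([1.0, 2.0, 4.0])
print(neg_smape(y_true, y_true))                      # perfect forecast scores (negative) zero
print(neg_smape(y_true, np.array([1.0, 2.0, 2.0])))   # ~ -0.2222
```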

Visualization¶

Finally, given the forecast DataFrame as input, the plot_forecast() method displays an interactive plot of the predictions (for each target variable) and their confidence intervals.

In [32]:
plot_forecast(fitted_pipeline=pipeline, summary_frame=y_forecast, additional_frames=dict(test=y_test_df))

References¶

  • More examples and details: http://automl.oraclecorp.com/
  • Oracle AutoML: http://www.vldb.org/pvldb/vol13/p3166-yakovlev.pdf
  • sktime: https://www.sktime.org/en/latest/
  • statsmodels: https://www.statsmodels.org/stable/index.html
  • M4 Competition: https://mofc.unic.ac.cy/m4/
  • Airline Dataset: https://www.sktime.org/en/stable/api_reference/auto_generated/sktime.datasets.load_airline.html